

Search results for all records where Creators/Authors contains: "Karia, Rushang"


  1. Yue, Y; Garg, A; Peng, N; Sha, F; Yu, R (Ed.)
    This paper presents AutoEval, a novel benchmark for scaling Large Language Model (LLM) assessment in formal tasks with clear notions of correctness, such as truth maintenance in translation and logical reasoning. AutoEval is the first benchmarking paradigm that offers several key advantages necessary for scaling objective evaluation of LLMs without human labeling: (a) ability to evaluate LLMs of increasing sophistication by auto-generating tasks at different levels of difficulty; (b) auto-generation of ground truth that eliminates dependence on expensive and time-consuming human annotation; (c) the use of automatically generated, randomized datasets that mitigate the ability of successive LLMs to overfit to static datasets used in many contemporary benchmarks. Empirical analysis shows that an LLM's performance on AutoEval is highly indicative of its performance on a diverse array of other benchmarks focusing on translation and reasoning tasks, making it a valuable autonomous evaluation paradigm in settings where hand-curated datasets can be hard to obtain and/or update. 
    Free, publicly-accessible full text available June 1, 2026
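
    To make the evaluation loop above concrete, here is a minimal sketch of an AutoEval-style autonomous assessment on a propositional-logic truth-maintenance task. The task generator, the `query_llm` stub, and the round-trip equivalence check are illustrative assumptions, not the paper's actual pipeline: ground truth (a truth table) is computed automatically for each randomly generated formula, so no human annotation is needed, and difficulty scales with formula depth.

```python
# Minimal, illustrative sketch (not the paper's pipeline): AutoEval-style
# autonomous assessment on propositional-logic truth maintenance.
# `query_llm` is a hypothetical stand-in for a real model call.
import itertools
import random

VARS = ["p", "q", "r"]

def random_formula(depth=2):
    """Auto-generate a task; larger depth yields a harder formula."""
    if depth == 0:
        return random.choice(VARS)
    op = random.choice(["and", "or", "not"])
    if op == "not":
        return f"(not {random_formula(depth - 1)})"
    return f"({random_formula(depth - 1)} {op} {random_formula(depth - 1)})"

def truth_table(formula):
    """Auto-generated ground truth: the formula's value under every assignment."""
    rows = []
    for values in itertools.product([False, True], repeat=len(VARS)):
        env = dict(zip(VARS, values))
        rows.append(eval(formula, {"__builtins__": {}}, env))
    return tuple(rows)

def query_llm(prompt):
    """Hypothetical LLM call; plug in a real model to run this sketch."""
    raise NotImplementedError

def autoeval(n_tasks=100, depth=2):
    """Round-trip scoring without human labels: formal -> English -> formal,
    counted correct only if logical equivalence is preserved."""
    correct = 0
    for _ in range(n_tasks):
        f = random_formula(depth)
        english = query_llm(f"Translate to plain English: {f}")
        f_back = query_llm(f"Translate back to propositional logic: {english}")
        try:
            correct += truth_table(f) == truth_table(f_back)
        except Exception:  # malformed model output counts as incorrect
            pass
    return correct / n_tasks
```

    Because every run draws fresh random formulas, a model cannot memorize a static test set, which mirrors the overfitting-mitigation claim above.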
  2. Reinforcement learning in problems with symbolic state spaces is challenging due to the need for reasoning over long horizons. This paper presents a new approach that utilizes relational abstractions in conjunction with deep learning to learn a generalizable Q-function for such problems. The learned Q-function can be efficiently transferred to related problems that have different object names and object quantities, and thus, entirely different state spaces. We show that the learned, generalized Q-function can be utilized for zero-shot transfer to related problems without an explicit, hand-coded curriculum. Empirical evaluations on a range of problems show that our method facilitates efficient zero-shot transfer of learned knowledge to much larger problem instances containing many objects. 
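
    Below is a minimal sketch of the core idea in this record: a relational abstraction that drops object names and quantities, so one Q-function transfers zero-shot across state spaces of different sizes. The predicate vocabulary, the count-based abstraction, and the linear Q are illustrative assumptions, not the paper's learned deep network.

```python
# Minimal sketch of a name/quantity-agnostic Q-function via relational
# abstraction (assumptions: states are sets of ground atoms like
# ("on", "a", "b"); the abstraction and the linear Q are illustrative).
from collections import Counter

PREDICATES = ["on", "clear", "holding"]

def abstract_state(atoms):
    """Map a ground state to canonical features: the fraction of atoms per
    predicate. Object names and object counts drop out, so the same feature
    vector applies to states with entirely different objects."""
    counts = Counter(atom[0] for atom in atoms)
    total = max(1, len(atoms))
    return [counts[p] / total for p in PREDICATES]

def q_value(weights, atoms, action_schema):
    """Linear Q over abstract features, one weight vector per action schema."""
    feats = abstract_state(atoms)
    return sum(w * f for w, f in zip(weights[action_schema], feats))

# Zero-shot transfer: the same weights score a 3-block state and a
# 100-block state, because both abstract into the same feature space.
weights = {"stack": [0.8, -0.2, 0.1], "unstack": [-0.5, 0.6, 0.0]}
small = {("on", "a", "b"), ("clear", "a"), ("clear", "c")}
print(q_value(weights, small, "stack"))
```

    The design choice doing the work here is that the feature vector has a fixed length regardless of how many objects the problem contains, which is what makes transfer to much larger instances possible without retraining.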
  3. Computing goal-directed behavior is essential to designing efficient AI systems. Due to the computational complexity of planning, current approaches rely primarily upon hand-coded symbolic action models and hand-coded heuristic function generators for efficiency. Learned heuristics for such problems have been of limited utility as they are difficult to apply to problems with objects and object quantities that are significantly different from those in the training data. This paper develops a new approach for learning generalized heuristics in the absence of symbolic action models using deep neural networks that utilize an input predicate vocabulary but are agnostic to object names and quantities. It uses an abstract state representation to facilitate data-efficient, generalizable learning. Empirical evaluation on a range of benchmark domains shows that in contrast to prior approaches, generalized heuristics computed by this method can be transferred easily to problems with different objects and with object quantities much larger than those in the training data.
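
    As a concrete illustration of the abstract-state idea in this record, here is a small sketch: features count unsatisfied goal atoms per predicate (so object names never enter), and a tiny network regresses those features to cost-to-go. The feature choice, network shape, and training signal (plan lengths observed on small problems) are assumptions made for illustration, not the paper's architecture.

```python
# Minimal sketch of learning a generalized heuristic without a symbolic
# action model (assumptions: predicate-count features and a tiny MLP,
# illustrative rather than the paper's exact setup).
from collections import Counter
import torch
import torch.nn as nn

PREDICATES = ["on", "clear", "holding"]

def features(state, goal):
    """Predicate-level features: how many goal atoms of each predicate are
    unsatisfied. Object names never appear, so the same network applies to
    problems with different (and many more) objects."""
    missing = Counter(atom[0] for atom in goal if atom not in state)
    return torch.tensor([[float(missing[p]) for p in PREDICATES]])

net = nn.Sequential(nn.Linear(len(PREDICATES), 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

def train_step(state, goal, observed_cost):
    """Regress the predicted heuristic value toward an observed cost-to-go,
    e.g. the length of a plan found on a small training problem."""
    opt.zero_grad()
    loss = ((net(features(state, goal)) - observed_cost) ** 2).mean()
    loss.backward()
    opt.step()

# Example: train on a small blocks-world pair; at test time,
# net(features(s, g)) estimates cost-to-go on problems with arbitrary
# objects, usable as the heuristic in greedy best-first search.
state = {("on", "a", "b"), ("clear", "a")}
goal = {("on", "b", "a"), ("clear", "b")}
train_step(state, goal, observed_cost=4.0)
```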